21 Computational aspects of resilient data extraction from semistructured
sources (extended abstract)

Hasan Davulcu , Guizhen Yang , Michael Kifer , I. V. Ramakrishnan

Proceedings of the nineteenth ACM SIGMOD-SIGACT-SIGART symposium on Principles
of database systems May 2000

Automatic data extraction from semistructured sources such as HTML pages is rapidly growing
into a problem of significant importance, spurred by the growing popularity of the so-called
"shopbots" that enable end users to compare prices of goods and other services at various web
sites without having to manually browse and fill out forms at each one of these sites. The main
problem one has to contend with when designing data extraction techniques is that the
contents of ...
22 Indexing and retrieval of scientific literature 80%

Steve Lawrence , Kurt Bollacker , C. Lee Giles 

Proceedings of the eighth international conference on Information and knowledge 
management November 1999 

The web has greatly improved access to scientific literature. However, scientific articles on the 
web are largely disorganized, with research articles being spread across archive sites, 
institution sites, journal sites, and researcher homepages. No index covers all of the available 
literature, and the major web search engines typically do not index the content of 
Postscript/PDF documents at all. This paper discusses the creation of digital libraries of 
scientific literature on the web, inci ... 



23 Training a selection function for extraction 80% 

Chin-Yew Lin 

Proceedings of the eighth international conference on Information and knowledge
management November 1999 

In this paper we compare performance of several heuristics in generating informative 
generic/query-oriented extracts for newspaper articles in order to learn how topic prominence 
affects the performance of each heuristic. We study how different query types can affect the 
performance of each heuristic and discuss the possibility of using machine learning algorithms 
to automatically learn good combination functions to combine several heuristics. We also 
briefly describe the design, implementa ... 
24 Conference review 80%

Stuart Lowry

intelligence September 1999
Volume 10 Issue 3 

25 Privacy interfaces for information management 80%

Tessa Lau , Oren Etzioni , Daniel S. Weld
Communications of the ACM October 1999
Volume 42 Issue 10 

26 A personal news agent that talks, learns and explains 80%

Daniel Billsus , Michael J. Pazzani

Proceedings of the third annual conference on Autonomous Agents April 1999

27 WebMate 80% 

Liren Chen , Katia Sycara 

Proceedings of the second international conference on Autonomous agents May 1998

28 Experiences with selecting search engines using metasearch 80%
Daniel Dreilinger , Adele E. Howe

ACM Transactions on Information Systems (TOIS) July 1997
Volume 15 Issue 3 

Search engines are among the most useful and high-profile resources on the Internet. The 
problem of finding information on the Internet has been replaced with the problem of knowing 
where search engines are, what they are designed to retrieve, and how to use them. This 
article describes and evaluates SavvySearch, a metasearch engine designed to intelligently 
select and interface with multiple remote search engines. The primary metasearch issue 
examined is the importance of carefully selecti ... 

29 Strategic directions in database systems— breaking out of the box 80% 

Avi Silberschatz , Stan Zdonik 

ACM Computing Surveys (CSUR) December 1996
Volume 28 Issue 4 

30 Session 9A: applications in commerce: CoMMA 80% 

Federico Bergenti , Agostino Poggi , Giovanni Rimassa , Paola Turci

Proceedings of the first international joint conference on Autonomous agents and 
multiagent systems: part 3 July 2002 

In this paper, we present CoMMA (Corporate Memory Management through Agents), an open, 
agent-based system for the management of a corporate memory that was realized integrating 
several emerging technologies: agent technology, knowledge modeling, XML technology, 
information retrieval and machine learning techniques. In particular, the system has been 
realized to help users in the management of an organization corporate memory and in 
particular to facilitate the creation, dissemination, transmissi ... 

31 Corpus Linguistics: Mining the web to create minority language corpora 80% 

Rayid Ghani , Rosie Jones , Dunja Mladenic

Proceedings of the tenth international conference on Information and knowledge
management October 2001

The Web is a valuable source of language specific resources but the process of collecting, 
organizing and utilizing these resources is difficult. We describe CorpusBuilder, an approach for 
automatically generating Web-search queries for collecting documents in a minority language. 
It differs from pseudo-relevance feedback in that retrieved documents are labeled by an
automatic language classifier as relevant or irrelevant, and this feedback is used to generate 
new queries. We experiment with var ... 



32 PVA 80% 

Chien Chin Chen , Meng Chang Chen , Yeali Sun 

Proceedings of the seventh ACM SIGKDD international conference on Knowledge 
discovery and data mining August 2001 

In this paper, we present PVA, an adaptive personal view information agent system to track, 
learn and manage, user's interests in Internet documents. When user's interests change, PVA, 
in not only the contents, but also in the structure of user profile, is modified to adapt to the 
changes. Experimental results show that modulating the structure of user profile does increase 
the accuracy of personalization systems. 



33 Pll: Finding scientific papers with homepagesearch and MOPS 80% 

Gerd Hoff , Martin Mundhenk 

Annual ACM Conference on Systems Documentation October 2001 

The fast dissemination of new research results on the world-wide web poses new challenges 
for search engines. In this paper we describe a new approach to seek scientific papers relevant 
to a pre-defined research area. Different from other approaches, we do not search for web 
pages which contain certain keywords, but we search for web pages which are created by 
scientists who are active in the research area under consideration. The names of these 
scientists are obtained from the DBLP server [9]. ... 



34 News 80% 
Douglas Blank 
intelligence June 2001 
Volume 12 Issue 2 



35 Topical locality in the Web 80% 

Brian D. Davison 

Proceedings of the 23rd annual international ACM SIGIR conference on Research and
development in information retrieval July 2000 



36 Extracting sentence segments for text summarization 80% 

Wesley T. Chuang , Jihoon Yang

Proceedings of the 23rd annual international ACM SIGIR conference on Research and
development in information retrieval July 2000 



37 Knowledge-based metadata extraction from PostScript files 80% 

Giovanni Giuffrida , Eddie C. Shek , Jihoon Yang 
Proceedings of the fifth ACM conference on Digital libraries June 2000

The automatic document metadata extraction process is an important task in a world where
thousands of documents are just one "click" away. Thus, powerful indices are necessary to
support effective retrieval. The upcoming XML standard represents an important step in this
direction as its semistructured representation conveys document metadata together with the
text of the document. For example, retrieval of scientific papers by authors or affiliations would 
be a s ... 



38 A learning agent for wireless news access 80% 

Daniel Billsus , Michael J. Pazzani , James Chen

Proceedings of the 5th international conference on Intelligent user interfaces January 
2000 

We describe a user interface for wireless information devices, specifically designed to facilitate
learning about users' individual interests in daily news stories. User feedback is collected
unobtrusively to form the basis for a content-based machine learning algorithm. As a result,
the described system can adapt to users' individual interests, reduce the amount of 
information that needs to be transmitted, and help users access relevant information with 
minimal effort. 



39 Learning routing queries in a query zone 

Amit Singhal , Mandar Mitra , Chris Buckley 

ACM SIGIR Forum , Proceedings of the 20th annual international ACM SIGIR conference 
on Research and development in information retrieval July 1997 
Volume 31 Issue SI 



40 The Datacycle architecture

T. F. Bowen , G. Gopal , G. Herman , T. Hickey , K. C. Lee , W. H. Mansfield , J. Raitz , A. Weinrib 
Communications of the ACM December 1992 
Volume 35 Issue 12 